Knowledge-Conscious Exploratory Data Clustering
Authors
Abstract
We consider the problem of efficiently executing data clustering queries in a client-server setting. Specifically, we consider an environment in which the entire data set is housed on a server and a client is interested in interactively performing kMeans clustering on different subsets of this data set. Extant solutions to this problem suffer from (a) a significant amount of remote I/O and (b) minimal re-use of computation, both between the iterations of a single kMeans query and between executions of different kMeans queries. We propose to facilitate interactive kMeans clustering by employing a client-side knowledge-cache. This knowledge-cache is succinct and significantly reduces the amount of remote I/O needed during execution. Furthermore, it permits the re-use of computation, both within and between executions of the kMeans queries. Our experimental study shows that client-side knowledge caching can speed up execution by nearly an order of magnitude.
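The abstract does not say what the knowledge-cache stores, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes the server data is split into blocks and that the client caches a per-block summary (point count and per-dimension sum). The names `KnowledgeCache`, `fetch_block`, and `kmeans_on_summaries` are hypothetical, and assigning a whole block to the centroid nearest its mean is an approximation of kMeans chosen for brevity, not the authors' method.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Summary = Tuple[int, Point]  # (number of points, per-dimension sum)


class KnowledgeCache:
    """Hypothetical client-side cache of per-block summaries."""

    def __init__(self, fetch_block: Callable[[int], List[Point]]):
        self._fetch_block = fetch_block  # stands in for a remote read
        self._summaries: Dict[int, Summary] = {}
        self.remote_reads = 0

    def summary(self, block_id: int) -> Summary:
        # Only the first request for a block touches the server.
        if block_id not in self._summaries:
            points = self._fetch_block(block_id)
            self.remote_reads += 1
            sums = tuple(sum(col) for col in zip(*points))
            self._summaries[block_id] = (len(points), sums)
        return self._summaries[block_id]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_on_summaries(cache: KnowledgeCache, block_ids: Sequence[int],
                        centroids: Sequence[Point],
                        iterations: int = 10) -> List[Point]:
    """Approximate kMeans that assigns whole blocks using cached summaries."""
    centroids = list(centroids)
    for _ in range(iterations):
        totals = [(0, (0.0,) * len(c)) for c in centroids]
        for block_id in block_ids:
            n, sums = cache.summary(block_id)
            if n == 0:
                continue  # an empty block contributes nothing
            mean = tuple(s / n for s in sums)
            j = min(range(len(centroids)),
                    key=lambda i: sq_dist(mean, centroids[i]))
            count, acc = totals[j]
            totals[j] = (count + n, tuple(a + s for a, s in zip(acc, sums)))
        # Keep the old centroid when no block was assigned to it.
        centroids = [tuple(a / count for a in acc) if count else old
                     for (count, acc), old in zip(totals, centroids)]
    return centroids


# Toy "server" data: four blocks of 2-D points (illustrative values only).
SERVER_BLOCKS = {
    0: [(0.0, 0.0), (0.0, 2.0)],
    1: [(2.0, 0.0), (2.0, 2.0)],
    2: [(10.0, 10.0), (12.0, 10.0)],
    3: [(10.0, 12.0), (12.0, 12.0)],
}

cache = KnowledgeCache(lambda block_id: SERVER_BLOCKS[block_id])
first = kmeans_on_summaries(cache, [0, 1, 2, 3], [(0.0, 0.0), (10.0, 10.0)])
reads_after_first = cache.remote_reads
# A second query on a subset is answered entirely from the cache.
second = kmeans_on_summaries(cache, [2, 3], [(10.0, 10.0)])
reads_after_second = cache.remote_reads
```

In this sketch the first query performs one remote read per block, every later iteration works on the cached summaries, and the second query on a subset of the same blocks triggers no further reads. This mirrors the re-use within and between queries that the abstract describes, though the real cache contents may differ.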
Similar resources
A clustering approach for mineral potential mapping: A deposit-scale porphyry copper exploration targeting
This work describes a knowledge-guided clustering approach for mineral potential mapping (MPM), by which the optimum number of clusters is derived from a knowledge-driven methodology through a concentration-area (C-A) multifractal analysis. To implement the proposed approach, a case study at the North Narbaghi region in Saveh, Markazi province of Iran, was investigated to discover porphyry ...
Collaborative and Knowledge-based Fuzzy Clustering
Clustering is commonly regarded as a synonym of unsupervised learning aimed at the discovery of structure in highly dimensional data. With a plethora of existing algorithms, the area offers an evident diversity of possible approaches along with their underlying features and potential applications. When augmented by fuzzy sets, fuzzy clustering has become an integral component of Computational I...
A new knowledge-based constrained clustering approach: Theory and application in direct marketing
Clustering has always been an exploratory but critical step in the knowledge discovery process. Often unsupervised, the clustering task received a huge interest when reinforced by different kinds of inputs provided by the user. This paper presents an approach giving the possibility to incorporate business knowledge in order to guide the clustering algorithm. A formalization of the fact that an ...
Amoeba: Hierarchical Clustering Based on Spatial Proximity Using Delaunay Diagram
Exploratory data analysis is increasingly necessary as larger spatial data sets are managed in electro-magnetic media. We propose an exploratory method that reveals a robust clustering hierarchy. Our approach uses the Delaunay diagram to incorporate spatial proximity. It does not require any prior knowledge about the data set, nor does it require parameters from the user. Multi-level clusters ar...
Co-clustering of biological networks and gene expression data
MOTIVATION: Large-scale gene expression data are often analysed by clustering genes based on gene expression data alone, though a priori knowledge in the form of biological networks is available. The use of this additional information promises to improve exploratory analysis considerably. RESULTS: We propose constructing a distance function which combines information from expression data and bi...